
About C-GREx

Chaos Game Representation Explorer (C-GREx) is intended mainly for exploring Chaos Game Representation (CGR). CGR is a relatively recent area of bioinformatics that explores the graphical representation of biological sequences.

Chaos Game Representation was proposed by H. J. Jeffrey in 1990 as a scale-independent representation of genomic sequences. The technique, formally an iterative map, can be traced further back to the foundations of statistical mechanics, in particular to chaos theory. The original proposal has since been considerably expanded and generalized to sequences of arbitrary symbols, and therefore to other biological sequences such as proteins.

As an analogy, consider English sentences: they are written with only 26 letters. Likewise, DNA, the genetic material of living organisms, is written with only four letters, A, G, C and T, where A stands for adenine, G for guanine, C for cytosine and T for thymine. These are the nitrogenous bases of the nucleotides that make up DNA. To generate the CGR of a DNA sequence, the four nucleotides A, G, C and T are assigned to the corners of a square. The first point is plotted halfway between the centre of the square and the corner of the first nucleotide in the sequence. Each successive point is plotted halfway between the previous point and the corner of the next nucleotide in the sequence.
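The plotting rule above is the iterative map p_i = (p_(i-1) + c_i) / 2, starting from the centre p_0 = (1/2, 1/2), where c_i is the corner of the i-th nucleotide. A minimal sketch in Python follows. The corner layout (A bottom-left, C top-left, G top-right, T bottom-right on the unit square) and the name `cgr_points` are illustrative assumptions, not taken from C-GREx; U is mapped to T's corner so the same function also accepts RNA.

```python
# Assumed corner layout on the unit square (illustrative, not from C-GREx):
# A bottom-left, C top-left, G top-right, T bottom-right.
# U shares T's corner so RNA sequences are handled the same way.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0), "U": (1, 0)}


def cgr_points(seq, corners=CORNERS):
    """Return one CGR point per position of seq, starting from the centre."""
    x, y = 0.5, 0.5  # centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = corners[base]
        # Move halfway from the previous point towards this base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("ACG"))
# [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125)]
```

The returned coordinates can be passed to any scatter-plot routine (for example matplotlib's `plt.scatter`) to draw the CGR image.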
 
An RNA sequence is composed of A, G, C and U (uracil), with U taking the place of the T found in DNA. The CGR of an RNA sequence is therefore built in the same way, with A, G, C and U assigned to the corners of a square: the first point is plotted halfway between the centre of the square and the corner of the first nucleotide, and each successive point halfway between the previous point and the corner of the next nucleotide in the sequence.

A protein sequence, on the other hand, is composed of 20 amino acids. CGRs of protein sequences can be drawn on polygons with 3 to 20 sides. The amino acids are grouped, and the groups are assigned to the corners of the polygon, using different methods. C-GREx lets the user select both the plot shape and the amino acid grouping strategy. Amino acids can be grouped manually or automatically; for automatic grouping, C-GREx applies k-means clustering to any of the 77 physico-chemical parameters available in its menu.
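The polygon variant with automatic grouping can be sketched as below. This is not C-GREx's implementation; several choices are assumptions: the Kyte-Doolittle hydropathy scale stands in for "a physico-chemical parameter" (it is not claimed to be one of the 77 in C-GREx's menu), the k-means is a plain one-dimensional Lloyd's algorithm with a deterministic start, the polygon is regular and inscribed in the unit circle, and each step moves halfway towards the vertex, as in the square case. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index, used only as an example parameter.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars; returns a cluster index for each value."""
    if not 2 <= k <= len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: centroids spread evenly over the sorted values.
    centroids = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: cluster mean; an empty cluster keeps its old centroid.
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def polygon_vertices(n):
    """Vertices of a regular n-gon on the unit circle, first vertex at the top."""
    return [(math.cos(math.pi / 2 + 2 * math.pi * i / n),
             math.sin(math.pi / 2 + 2 * math.pi * i / n)) for i in range(n)]


def group_amino_acids(k, scale=HYDROPATHY):
    """Map each amino acid to one of k groups (polygon corners)."""
    aas = sorted(scale)
    return dict(zip(aas, kmeans_1d([scale[a] for a in aas], k)))


def protein_cgr(seq, k=4, ratio=0.5):
    """CGR points of a protein on a k-sided polygon (ratio 0.5 is assumed)."""
    groups = group_amino_acids(k)
    vertices = polygon_vertices(k)
    x, y = 0.0, 0.0  # start at the polygon's centre
    points = []
    for aa in seq.upper():
        vx, vy = vertices[groups[aa]]
        # Move a fraction `ratio` of the way towards the group's vertex.
        x, y = x + ratio * (vx - x), y + ratio * (vy - y)
        points.append((x, y))
    return points


print(group_amino_acids(4))
print(protein_cgr("MKTAYIAKQR", k=5)[:3])
```

With duplicate parameter values, k-means can leave a cluster empty, in which case some polygon vertices receive no amino acids; a manual grouping avoids this.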
 
Every point of a CGR represents the sequence up to that position, and there is a one-to-one correspondence between these prefixes of the sequence and the points in the CGR. Because each point is plotted halfway towards the corner of the base just read, it always falls in that base's quadrant; conversely, any two points in the same quadrant must have the same last base. Properties like these make CGR a favorite tool of sequence analysts.
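The one-to-one correspondence can be made concrete by running the map backwards: the quadrant of a point gives the last base, and undoing p = (prev + c) / 2 recovers the previous point, so the whole sequence can be read back from its final point. The sketch below assumes the same illustrative corner layout as before (not taken from C-GREx), and the names `cgr_point` and `decode` are hypothetical. It uses exact `Fraction` arithmetic because each base consumes one bit per coordinate, so a 64-bit float can resolve only roughly the last 50 bases.

```python
from fractions import Fraction

# Assumed corner layout (illustrative, not from C-GREx).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
CORNER_TO_BASE = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def cgr_point(seq):
    """Final CGR point of seq, computed exactly with fractions."""
    x, y = HALF, HALF
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def decode(point, max_steps=100_000):
    """Recover the whole sequence from its final CGR point."""
    x, y = point
    bases = []
    # Every point after the start has both coordinates != 1/2, so stop at the centre.
    while (x, y) != (HALF, HALF):
        if len(bases) >= max_steps:
            raise ValueError("point is not the end of a finite CGR walk")
        # The quadrant containing the point identifies the last base read.
        corner = (int(x > HALF), int(y > HALF))
        bases.append(CORNER_TO_BASE[corner])
        # Undo p = (prev + c) / 2, i.e. prev = 2p - c.
        x, y = 2 * x - corner[0], 2 * y - corner[1]
    return "".join(reversed(bases))


p = cgr_point("GATTACA")
print(p[0] < HALF and p[1] < HALF)  # True: last base is A, so A's quadrant
print(decode(p))                    # GATTACA
```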

C-GREx is a handy tool for anyone working in sequence visualization and the analysis of patterns in biological sequences. It packs a wide variety of exploration facilities based on Chaos Game Representation for DNA, RNA and amino acid sequences.